SIMD code generation in data-parallel programming
نویسنده
چکیده
Today’s desktop PCs feature a variety of parallel processing units. Developing applications that exploit this parallelism is a demanding task, and a programmer has to obtain detailed knowledge about the hardware for efficient implementation. CGiS is a data-parallel programming language providing a unified abstraction for two parallel processing units: graphics processing units (GPUs) and the vector processing units of CPUs. The CGiS compiler framework fully virtualizes the differences in capability and accessibility by mapping an abstract data-parallel programming model on those targets. The applicability of CGiS for GPUs has been shown in previous work; this work focuses on applying the abstract programming model of CGiS to CPUs with SIMD (Single Instruction Multiple Data) instruction sets. We have identified, adapted and implemented a set of program analyses to expose and access the available parallelism. The code generation phase is based on selected optimization algorithms tailored to SIMD code generation. Via code generation profiles, it is possible to adapt the code generation strategy to different target architectures. To assess the effectiveness of our approach, we have implemented backends for the two most widespread SIMD instruction sets, namely Intel’s Streaming SIMD Extensions and Freescale’s AltiVec. Additionally, we integrated a prototypical backend for the Cell Broadband Engine as an example for a multi-core architecture. Our experimental results show excellent average performance gains by a factor of 3 compared to standard scalar C++ implementations and underline the viability of this approach: real-world applications can be implemented easily with CGiS and result in efficient code.
منابع مشابه
Verified Code Generation for Embedded Systems
Digital signal processors provide specialized SIMD (single instruction multiple data) operations designed to dramatically increase performance in embedded systems. While these operations are simple to understand, their unusual functions and their parallelism make it difficult for automatic code generation algorithms to use them effectively. In this paper, we present a new optimizing code genera...
متن کاملPack Instruction Generation for Media Processors Using Multi-valued Decision Diagram
SIMD instructions are often implemented in modern multimedia oriented processors. Although SIMD instructions are useful for many digital signal processing applications, most compilers do not exploit SIMD instructions. The difficulty in the utilization of SIMD instructions stems from data parallelism in registers. In assembly code generation, the positions of data in registers must be noted. A t...
متن کاملParallel Generation of t-ary Trees
A parallel algorithm for generating t-ary tree sequences in reverse B-order is presented. The algorithm generates t-ary trees by 0-1 sequences, and each 0-1 sequences is generated in constant average time O(1). The algorithm is executed on a CREW SM SIMD model, and is adaptive and cost-optimal. Prior to the discussion of the parallel algorithm a new sequential generation with O(1) average time ...
متن کاملA Comparative Study of the Programmability of a Signal Processing Application in an MIMD and an SIMD Multiprocessor
In this report, we address the issues of compilation and execution of a functional program, SISAL (Streams and Iterations in a Single Assignment Language), on the MP-1TM SIMD (Single Instructionstream Multiple Data-stream) parallel machine. SISAL has been successful on many shared memory multiprocessors (SMM) as well as sequential machines. However, the compiler has not been available for distr...
متن کاملPerformance Evaluation of Parallel Simd
A simulator for SIMD type architectures is presented. Starting from an architecture independent algorithm description based on recurrence equations, transformation steps for automatic parallelization, mapping and code generation are outlined. The nal pseudo code program together with architecture dependent parameters and execution time tables, are fed into the simulator in order to gain perform...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009